CADERNOS DE COMPUTAÇÃO XX (2003) Learning with Skewed Class Distributions

Authors

  • Maria Carolina Monard
  • Gustavo E. A. P. A. Batista
Abstract

Several aspects may influence the performance achieved by a classifier created by a Machine Learning system. One of these aspects is the difference between the numbers of examples belonging to each class. When this difference is large, the learning system may have difficulty learning the concept related to the minority class. In this work, we discuss several issues related to learning with skewed class distributions, such as the relationship between cost-sensitive learning and class distributions, and the limitations of accuracy and error rate as measures of classifier performance. We also survey some methods proposed by the Machine Learning community to solve the problem of learning with imbalanced data sets, and discuss some limitations of these methods.
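The limitation of accuracy mentioned in the abstract can be illustrated with a small sketch. The data below are hypothetical (990 majority-class and 10 minority-class examples, chosen only for illustration and not taken from the paper), and the "classifier" is a trivial one that always predicts the majority class. The function names are likewise illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of minority (positive) examples that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical skewed data set: class 1 is the minority class.
y_true = [0] * 990 + [1] * 10

# Trivial classifier: always predict the majority class.
y_pred = [0] * len(y_true)

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
rec = recall(tp, fn)

print(f"accuracy        = {acc:.2f}")
print(f"error rate      = {1 - acc:.2f}")
print(f"minority recall = {rec:.2f}")
```

Under these assumptions the trivial classifier scores high accuracy and a low error rate while identifying none of the minority-class examples, which is why a class-specific measure is reported alongside accuracy here.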


Similar resources

Using Rule Sets to Maximize ROC Performance

Rules are commonly used for classification because they are modular, intelligible and easy to learn. Existing work in classification rule learning assumes the goal is to produce categorical classifications to maximize classification accuracy. Recent work in machine learning has pointed out the limitations of classification accuracy: when class distributions are skewed, or error costs are unequa...

Toward Scalable Learning with Non-uniform Distributions: Effects and a Multi-classifier Approach

Many factors influence the performance of a learned classifier. In this paper we study different methods of measuring performance based on a unified set of cost models and the effects of training class distribution with respect to these models. Observations from these effects help us devise a distributed multi-classifier meta-learning approach to learn in domains with skewed class distributions, non-...

Data mining with imbalanced class distributions: concepts and methods

Some real-world data mining applications present imbalanced or skewed class distributions. In these domains, the underrepresented classes are often the ones we are more interested in. However, most learning algorithms are not able to induce meaningful classifiers in some imbalanced domains. One reason for this poor performance is that learning algorithms tend to focus on abundant classes to max...

Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection

Very large databases with skewed class distributions and non-uniform cost per error are not uncommon in real-world data mining tasks. We devised a multi-classifier meta-learning approach to address these three issues. Our empirical results from a credit card fraud detection task indicate that the approach can significantly reduce loss due to illegitimate transactions.

Bayesian Multivariate Regression Analysis with a New Class of Skewed Distributions

In this paper, we introduce a novel class of skewed multivariate distributions and, more generally, a method of building such a class on the basis of univariate skewed distributions. The method is based on a general linear transformation of a multidimensional random variable with independent components, each with a skewed distribution. Our proposed class of multivariate skewed distributions has...



Publication date: 2002